
SEARCH KEYWORD -- DATA ENGINEERING



  Make Big Data Collection Efficient with Hadoop Architecture and Design Tools

Hadoop architecture and design is popular for distributing a small amount of code across a large number of computers. That is why big data collection can be made more efficient with Hadoop architecture and design. Hadoop is an open source system, so you are free to make changes and design new tools according to your business requirements. Here we will discuss the most popular tools in the Hadoop development category and how they help with big projects. Ambari and Hive– When you are designing...

   HADOOP ARCHITECTURE,HADOOP HIVE ARCHITECTURE,HADOOP ARCHITECTURE AND DESIGN     2015-09-17 05:24:44

  A plugin to update last_error in Delayed Job

delayed_job is a process-based asynchronous task processing gem that can run in the background. It forks the specified number of processes to execute tasks asynchronously. The task status is usually stored in the database so that it can be easily integrated into a Rails application where asynchronous job execution is desired. Normally, when a job fails to execute or an error occurs, it saves the error to the database in the last_error column. Ideally all these will be handled b...

   RUBY,RUBY ON RAILS,DELAYED JOB,LAST_ERROR     2017-11-18 13:05:49

  Introduction to DTLS (Datagram Transport Layer Security)

Secure communication has become a vital requirement on the Internet. Much of the information transferred over the Internet is sensitive, such as financial transactions, medical information and media streaming. To ensure the security of data transferred on the Internet, a few secure protocols have been designed, including SSL/TLS and IPsec. Many large websites around the world have adopted TLS. Apart from SSL/TLS, there are other protocols designed for special cases. One of them is ...

   JAVA 9,DTLS,TLS,SECURITY     2016-04-02 05:55:36

  Good programmers make bad designers

I got an email request to publish this article a few days ago. I was actually on the verge of moving the email to the trash when I noticed the first name of the author: Rand. For those of you not familiar with the Wheel of Time series, the main character’s name is Rand. I admit that it’s an embarrassingly weak reason to respond to a strange email, but reading some 10,000 pages of a fantasy series obviously messes with your mind. Then again, it’s probably no stranger than...

   Programmer,Designer,Comparison     2011-08-29 22:06:33

  What is cache penetration, cache breakdown and cache avalanche?

When designing and developing a highly available system, caching is a very important consideration. It is useful to cache frequently accessed data so that it can be accessed quickly, and a cache can also protect downstream systems such as the DB from being hit too often. To provide better cache design in large systems, some problems may need to be considered first. In this post, we will talk about some frequently discussed cache problems and mitigation plans. Cache penetration Cache penetrati...

   SYSTEM DESIGN,CACHE PENETRATION,CACHE BREAKDOWN,CACHE AVALANCHE     2020-04-10 08:43:00

  Google CEO: Facebook holds its users hostage

Google CEO Larry Page claimed in a recent media interview that it’s unfortunate that Facebook has been pretty closed with its data while Google is in the business of searching data. Page has been attacking Facebook’s ban on the search engine searching its data. In fact, the battle between the two sides has been going on for several years, and in June 2011 Google launched its social networking service Google+, which further exacerbated the tension. On Monday, Page, in an interview...

   Google,Facebook,Hostage,Larry Page     2012-05-23 05:58:07

  Why should we drop or reduce use of MD5?

MD5 is a frequently used one-way hash algorithm. It is commonly used in the following situations: Check data integrity. We take the hash of data stored in two different places and compare the results. If the hash results are the same, there is no need to check the actual data. This relies on collision resistance: two different data blocks have little chance of producing the same hash value. Many data service providers use this technique to detect duplicate data to avoid repeating...

   MD5,Vulnerability,attack     2012-09-29 04:47:49

  Smuggling data in pointers

While reading up on The ABA Problem I came across a fantastic hack. The ABA problem, in a nutshell, results from the inability to atomically access both a pointer and a "marked" bit at the same time (read the Wikipedia page). One fun, but very hackish solution is to "smuggle" data in a pointer. Example:

    #include "stdio.h"
    void * smuggle(void * ptr, int value){
      return (void *)( (long long)ptr | (value & 3) );
    }
    int recoverData(void * ptr){
      return (long long)ptr &...

   C,Pointer,Bit,Data,Atomic,Smuggle     2011-11-14 08:15:59
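
The pointer-smuggling idea in the entry above can be sketched as follows. This is a minimal illustration, not the article's own code: the names tag_pointer, pointer_tag, strip_tag and TAG_MASK are hypothetical, and it assumes the pointed-to objects are at least 4-byte aligned so that the two low bits of the address are always zero and free to carry a small value. It also assumes the platform provides uintptr_t, which is optional in standard C but widely available.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mask for the two low bits; valid only when every
 * pointer passed in is at least 4-byte aligned (an assumption). */
#define TAG_MASK ((uintptr_t)3)

/* Pack a 2-bit value into the unused low bits of an aligned pointer. */
static void *tag_pointer(void *ptr, unsigned tag) {
    uintptr_t bits = (uintptr_t)ptr;
    assert((bits & TAG_MASK) == 0); /* the low bits must be free */
    return (void *)(bits | ((uintptr_t)tag & TAG_MASK));
}

/* Recover the 2-bit value stored by tag_pointer. */
static unsigned pointer_tag(void *tagged) {
    return (unsigned)((uintptr_t)tagged & TAG_MASK);
}

/* Recover the original pointer; required before dereferencing. */
static void *strip_tag(void *tagged) {
    return (void *)((uintptr_t)tagged & ~TAG_MASK);
}
```

Because the pointer and the tag share one machine word, both can be read or swapped with a single atomic operation, which is the property the ABA discussion is after. A tagged pointer must always go through strip_tag before it is dereferenced.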

  Hologres vs AWS Redshift

Hologres and Redshift are both data warehousing solutions, but they have some differences in terms of features, architecture, and target use cases. Underlying Infrastructure Hologres: Built on Alibaba Cloud's Apsara distributed computing platform, Hologres leverages the underlying infrastructure for storage, computation, and management. It benefits from Alibaba's expertise in cloud-native architecture and real-time data processing. Redshift: Amazon Redshift is based on a Massively Parallel Pro...

   HOLOGRES,REDSHIFT,ALIBABA,AWS,BIG DATA,REAL-TIME     2024-03-23 01:36:41

  Using C for a specialized data store

Pixenomics stores and transports 1.2 million pixels from the server to the client. During development we played with various methods to store and process this. Our ultimate goal was to send the entire board in under 1 second. During the stages of prototyping we used a MySQL database without thinking too much about performance. With a mere 2,000 pixels we quickly realised this wasn’t even usable as a demo. Changing the storage engine to memory was much better but still obviously unu...

   C,Data store,Efficiency,Performance     2012-03-07 05:09:38